[VPlan] Don't fold UDiv in replicate regions.#175460

Merged
fhahn merged 3 commits into llvm:main from fhahn:vplan-dont-fold-udiv-in-replicate-region on Jan 12, 2026

fhahn (Contributor) commented Jan 11, 2026

The UDiv fold added in d12e993 (#174581) is currently also applied to replicate regions, so we may end up with VPInstructions in replicate regions, which is currently not supported.

Fixes #175295.
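
For context, a udiv by a power-of-two constant computes the same value as a logical shift right by the base-2 logarithm of that constant, which is the rewrite the fold performs. The following is a minimal standalone sketch of that equivalence on 8-bit values, not LLVM code. The helper names `exactLogBase2`, `foldedUDiv`, and `foldMatchesUDiv` are hypothetical; the first only mirrors the name of the `APInt` method used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log2 of a value that is known to be a power of two.
static unsigned exactLogBase2(uint8_t C) {
  assert(C != 0 && (C & (C - 1)) == 0 && "expected a power of two");
  unsigned Log = 0;
  while ((C >> Log) != 1)
    ++Log;
  return Log;
}

// Hypothetical helper: the value the shifted (lshr) form computes.
static uint8_t foldedUDiv(uint8_t A, uint8_t C) {
  return static_cast<uint8_t>(A >> exactLogBase2(C));
}

// Exhaustively compare A / C with the shifted form for every 8-bit A and
// every 8-bit power-of-two divisor C.
static bool foldMatchesUDiv() {
  for (unsigned Shift = 0; Shift < 8; ++Shift) {
    uint8_t C = static_cast<uint8_t>(1u << Shift);
    for (unsigned A = 0; A < 256; ++A)
      if (static_cast<uint8_t>(A / C) != foldedUDiv(static_cast<uint8_t>(A), C))
        return false;
  }
  return true;
}
```

With a divisor of 1 the shift amount is 0, which lines up with the `lshr <4 x i8> ..., zeroinitializer` in the expected output of the first test in this PR.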

llvmbot (Member) commented Jan 11, 2026

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

The UDiv fold added in d12e993 (#174581) is currently also applied to replicate regions, so we may end up with VPInstructions in replicate regions, which is currently not supported.

Fixes #175295.


Full diff: https://github.com/llvm/llvm-project/pull/175460.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+6-1)
  • (added) llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll (+310)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a430f13f0c9c0..19c66e1efb956 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1352,7 +1352,12 @@ static void simplifyRecipe(VPSingleDefRecipe *Def, VPTypeAnalysis &TypeInfo) {
         {A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())},
         *cast<VPRecipeWithIRFlags>(Def), Def->getDebugLoc()));
 
-  if (match(Def, m_UDiv(m_VPValue(A), m_APInt(APC))) && APC->isPowerOf2())
+  // Don't convert udiv to lshr inside a replicate region, as VPInstructions are
+  // not allowed in them.
+  const VPRegionBlock *ParentRegion = Def->getParent()->getParent();
+  bool IsInReplicateRegion = ParentRegion && ParentRegion->isReplicator();
+  if (!IsInReplicateRegion && match(Def, m_UDiv(m_VPValue(A), m_APInt(APC))) &&
+      APC->isPowerOf2())
     return Def->replaceAllUsesWith(Builder.createNaryOp(
         Instruction::LShr,
         {A, Plan->getConstantInt(APC->getBitWidth(), APC->exactLogBase2())}, {},
diff --git a/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll b/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll
new file mode 100644
index 0000000000000..45f211b9b5284
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/predicated-udiv.ll
@@ -0,0 +1,310 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S %s | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Test case for https://github.com/llvm/llvm-project/issues/175295.
+define i32 @simplify_udiv_1_in_replicate_region(i8 %arg, ptr %src) {
+; CHECK-LABEL: define i32 @simplify_udiv_1_in_replicate_region(
+; CHECK-SAME: i8 [[ARG:%.*]], ptr [[SRC:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> poison, i8 [[TMP0]], i64 0
+; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT]], zeroinitializer
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[INDEX]]
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 4
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP3]], align 1
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq <4 x i8> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT:    [[PREDPHI:%.*]] = select <4 x i1> [[TMP4]], <4 x i8> zeroinitializer, <4 x i8> [[TMP1]]
+; CHECK-NEXT:    [[TMP5:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT:    [[TMP6:%.*]] = zext <4 x i1> [[TMP5]] to <4 x i32>
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP6]], i32 3
+; CHECK-NEXT:    br label %[[SCALAR_PH:.*]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT:    [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT:    [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT:    [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT:    br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT:    [[UDIV:%.*]] = udiv i8 [[OR]], 1
+; CHECK-NEXT:    br label %[[LATCH]]
+; CHECK:       [[LATCH]]:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT:    [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 0
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+  %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+  %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+  %l = load i8, ptr %gep.src
+  %c = icmp eq i8 %l, 0
+  br i1 %c, label %latch, label %then
+
+then:
+  %or = or i8 %arg, 1
+  %udiv = udiv i8 %or, 1
+  br label %latch
+
+latch:
+  %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+  %cmp = icmp ne i8 %phi, 0
+  %zext = zext i1 %cmp to i32
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv, 18
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 0
+}
+
+define i32 @simplify_udiv_4_in_replicate_region2(i8 %arg, ptr noalias %src, ptr %dst) {
+; CHECK-LABEL: define i32 @simplify_udiv_4_in_replicate_region2(
+; CHECK-SAME: i8 [[ARG:%.*]], ptr noalias [[SRC:%.*]], ptr [[DST:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE29:.*]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP2:%.*]] = add i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP3:%.*]] = add i32 [[INDEX]], 3
+; CHECK-NEXT:    [[TMP4:%.*]] = add i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP5:%.*]] = add i32 [[INDEX]], 5
+; CHECK-NEXT:    [[TMP6:%.*]] = add i32 [[INDEX]], 6
+; CHECK-NEXT:    [[TMP7:%.*]] = add i32 [[INDEX]], 7
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[TMP0]]
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TMP8]], i64 4
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i8>, ptr [[TMP8]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i8>, ptr [[TMP9]], align 1
+; CHECK-NEXT:    [[TMP10:%.*]] = icmp ne <4 x i8> [[WIDE_LOAD]], zeroinitializer
+; CHECK-NEXT:    [[TMP11:%.*]] = icmp ne <4 x i8> [[WIDE_LOAD1]], zeroinitializer
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP0]]
+; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP1]]
+; CHECK-NEXT:    [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP2]]
+; CHECK-NEXT:    [[TMP15:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP3]]
+; CHECK-NEXT:    [[TMP16:%.*]] = insertelement <4 x ptr> poison, ptr [[TMP12]], i32 0
+; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <4 x ptr> [[TMP16]], ptr [[TMP13]], i32 1
+; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <4 x ptr> [[TMP17]], ptr [[TMP14]], i32 2
+; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <4 x ptr> [[TMP18]], ptr [[TMP15]], i32 3
+; CHECK-NEXT:    [[TMP20:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP4]]
+; CHECK-NEXT:    [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP5]]
+; CHECK-NEXT:    [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP6]]
+; CHECK-NEXT:    [[TMP23:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[TMP7]]
+; CHECK-NEXT:    [[TMP24:%.*]] = insertelement <4 x ptr> poison, ptr [[TMP20]], i32 0
+; CHECK-NEXT:    [[TMP25:%.*]] = insertelement <4 x ptr> [[TMP24]], ptr [[TMP21]], i32 1
+; CHECK-NEXT:    [[TMP26:%.*]] = insertelement <4 x ptr> [[TMP25]], ptr [[TMP22]], i32 2
+; CHECK-NEXT:    [[TMP27:%.*]] = insertelement <4 x ptr> [[TMP26]], ptr [[TMP23]], i32 3
+; CHECK-NEXT:    [[TMP28:%.*]] = extractelement <4 x i1> [[TMP10]], i32 0
+; CHECK-NEXT:    br i1 [[TMP28]], label %[[PRED_LOAD_IF:.*]], label %[[PRED_LOAD_CONTINUE:.*]]
+; CHECK:       [[PRED_LOAD_IF]]:
+; CHECK-NEXT:    [[TMP29:%.*]] = load i8, ptr [[TMP12]], align 1
+; CHECK-NEXT:    [[TMP30:%.*]] = insertelement <4 x i8> poison, i8 [[TMP29]], i32 0
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE]]
+; CHECK:       [[PRED_LOAD_CONTINUE]]:
+; CHECK-NEXT:    [[TMP31:%.*]] = phi <4 x i8> [ poison, %[[VECTOR_BODY]] ], [ [[TMP30]], %[[PRED_LOAD_IF]] ]
+; CHECK-NEXT:    [[TMP32:%.*]] = extractelement <4 x i1> [[TMP10]], i32 1
+; CHECK-NEXT:    br i1 [[TMP32]], label %[[PRED_LOAD_IF2:.*]], label %[[PRED_LOAD_CONTINUE3:.*]]
+; CHECK:       [[PRED_LOAD_IF2]]:
+; CHECK-NEXT:    [[TMP33:%.*]] = load i8, ptr [[TMP13]], align 1
+; CHECK-NEXT:    [[TMP34:%.*]] = insertelement <4 x i8> [[TMP31]], i8 [[TMP33]], i32 1
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE3]]
+; CHECK:       [[PRED_LOAD_CONTINUE3]]:
+; CHECK-NEXT:    [[TMP35:%.*]] = phi <4 x i8> [ [[TMP31]], %[[PRED_LOAD_CONTINUE]] ], [ [[TMP34]], %[[PRED_LOAD_IF2]] ]
+; CHECK-NEXT:    [[TMP36:%.*]] = extractelement <4 x i1> [[TMP10]], i32 2
+; CHECK-NEXT:    br i1 [[TMP36]], label %[[PRED_LOAD_IF4:.*]], label %[[PRED_LOAD_CONTINUE5:.*]]
+; CHECK:       [[PRED_LOAD_IF4]]:
+; CHECK-NEXT:    [[TMP37:%.*]] = load i8, ptr [[TMP14]], align 1
+; CHECK-NEXT:    [[TMP38:%.*]] = insertelement <4 x i8> [[TMP35]], i8 [[TMP37]], i32 2
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE5]]
+; CHECK:       [[PRED_LOAD_CONTINUE5]]:
+; CHECK-NEXT:    [[TMP39:%.*]] = phi <4 x i8> [ [[TMP35]], %[[PRED_LOAD_CONTINUE3]] ], [ [[TMP38]], %[[PRED_LOAD_IF4]] ]
+; CHECK-NEXT:    [[TMP40:%.*]] = extractelement <4 x i1> [[TMP10]], i32 3
+; CHECK-NEXT:    br i1 [[TMP40]], label %[[PRED_LOAD_IF6:.*]], label %[[PRED_LOAD_CONTINUE7:.*]]
+; CHECK:       [[PRED_LOAD_IF6]]:
+; CHECK-NEXT:    [[TMP41:%.*]] = load i8, ptr [[TMP15]], align 1
+; CHECK-NEXT:    [[TMP42:%.*]] = insertelement <4 x i8> [[TMP39]], i8 [[TMP41]], i32 3
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE7]]
+; CHECK:       [[PRED_LOAD_CONTINUE7]]:
+; CHECK-NEXT:    [[TMP43:%.*]] = phi <4 x i8> [ [[TMP39]], %[[PRED_LOAD_CONTINUE5]] ], [ [[TMP42]], %[[PRED_LOAD_IF6]] ]
+; CHECK-NEXT:    [[TMP44:%.*]] = extractelement <4 x i1> [[TMP11]], i32 0
+; CHECK-NEXT:    br i1 [[TMP44]], label %[[PRED_LOAD_IF8:.*]], label %[[PRED_LOAD_CONTINUE9:.*]]
+; CHECK:       [[PRED_LOAD_IF8]]:
+; CHECK-NEXT:    [[TMP45:%.*]] = load i8, ptr [[TMP20]], align 1
+; CHECK-NEXT:    [[TMP46:%.*]] = insertelement <4 x i8> poison, i8 [[TMP45]], i32 0
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE9]]
+; CHECK:       [[PRED_LOAD_CONTINUE9]]:
+; CHECK-NEXT:    [[TMP47:%.*]] = phi <4 x i8> [ poison, %[[PRED_LOAD_CONTINUE7]] ], [ [[TMP46]], %[[PRED_LOAD_IF8]] ]
+; CHECK-NEXT:    [[TMP48:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT:    br i1 [[TMP48]], label %[[PRED_LOAD_IF10:.*]], label %[[PRED_LOAD_CONTINUE11:.*]]
+; CHECK:       [[PRED_LOAD_IF10]]:
+; CHECK-NEXT:    [[TMP49:%.*]] = load i8, ptr [[TMP21]], align 1
+; CHECK-NEXT:    [[TMP50:%.*]] = insertelement <4 x i8> [[TMP47]], i8 [[TMP49]], i32 1
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE11]]
+; CHECK:       [[PRED_LOAD_CONTINUE11]]:
+; CHECK-NEXT:    [[TMP51:%.*]] = phi <4 x i8> [ [[TMP47]], %[[PRED_LOAD_CONTINUE9]] ], [ [[TMP50]], %[[PRED_LOAD_IF10]] ]
+; CHECK-NEXT:    [[TMP52:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT:    br i1 [[TMP52]], label %[[PRED_LOAD_IF12:.*]], label %[[PRED_LOAD_CONTINUE13:.*]]
+; CHECK:       [[PRED_LOAD_IF12]]:
+; CHECK-NEXT:    [[TMP53:%.*]] = load i8, ptr [[TMP22]], align 1
+; CHECK-NEXT:    [[TMP54:%.*]] = insertelement <4 x i8> [[TMP51]], i8 [[TMP53]], i32 2
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE13]]
+; CHECK:       [[PRED_LOAD_CONTINUE13]]:
+; CHECK-NEXT:    [[TMP55:%.*]] = phi <4 x i8> [ [[TMP51]], %[[PRED_LOAD_CONTINUE11]] ], [ [[TMP54]], %[[PRED_LOAD_IF12]] ]
+; CHECK-NEXT:    [[TMP56:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT:    br i1 [[TMP56]], label %[[PRED_LOAD_IF14:.*]], label %[[PRED_LOAD_CONTINUE15:.*]]
+; CHECK:       [[PRED_LOAD_IF14]]:
+; CHECK-NEXT:    [[TMP57:%.*]] = load i8, ptr [[TMP23]], align 1
+; CHECK-NEXT:    [[TMP58:%.*]] = insertelement <4 x i8> [[TMP55]], i8 [[TMP57]], i32 3
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE15]]
+; CHECK:       [[PRED_LOAD_CONTINUE15]]:
+; CHECK-NEXT:    [[TMP59:%.*]] = phi <4 x i8> [ [[TMP55]], %[[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], %[[PRED_LOAD_IF14]] ]
+; CHECK-NEXT:    [[TMP60:%.*]] = lshr <4 x i8> [[TMP43]], splat (i8 1)
+; CHECK-NEXT:    [[TMP61:%.*]] = lshr <4 x i8> [[TMP59]], splat (i8 1)
+; CHECK-NEXT:    [[TMP62:%.*]] = extractelement <4 x i1> [[TMP10]], i32 0
+; CHECK-NEXT:    br i1 [[TMP62]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
+; CHECK:       [[PRED_STORE_IF]]:
+; CHECK-NEXT:    [[TMP63:%.*]] = extractelement <4 x i8> [[TMP60]], i32 0
+; CHECK-NEXT:    store i8 [[TMP63]], ptr [[TMP12]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE]]
+; CHECK:       [[PRED_STORE_CONTINUE]]:
+; CHECK-NEXT:    [[TMP64:%.*]] = extractelement <4 x i1> [[TMP10]], i32 1
+; CHECK-NEXT:    br i1 [[TMP64]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
+; CHECK:       [[PRED_STORE_IF16]]:
+; CHECK-NEXT:    [[TMP65:%.*]] = extractelement <4 x i8> [[TMP60]], i32 1
+; CHECK-NEXT:    store i8 [[TMP65]], ptr [[TMP13]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE17]]
+; CHECK:       [[PRED_STORE_CONTINUE17]]:
+; CHECK-NEXT:    [[TMP66:%.*]] = extractelement <4 x i1> [[TMP10]], i32 2
+; CHECK-NEXT:    br i1 [[TMP66]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
+; CHECK:       [[PRED_STORE_IF18]]:
+; CHECK-NEXT:    [[TMP67:%.*]] = extractelement <4 x i8> [[TMP60]], i32 2
+; CHECK-NEXT:    store i8 [[TMP67]], ptr [[TMP14]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE19]]
+; CHECK:       [[PRED_STORE_CONTINUE19]]:
+; CHECK-NEXT:    [[TMP68:%.*]] = extractelement <4 x i1> [[TMP10]], i32 3
+; CHECK-NEXT:    br i1 [[TMP68]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
+; CHECK:       [[PRED_STORE_IF20]]:
+; CHECK-NEXT:    [[TMP69:%.*]] = extractelement <4 x i8> [[TMP60]], i32 3
+; CHECK-NEXT:    store i8 [[TMP69]], ptr [[TMP15]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE21]]
+; CHECK:       [[PRED_STORE_CONTINUE21]]:
+; CHECK-NEXT:    [[TMP70:%.*]] = extractelement <4 x i1> [[TMP11]], i32 0
+; CHECK-NEXT:    br i1 [[TMP70]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23:.*]]
+; CHECK:       [[PRED_STORE_IF22]]:
+; CHECK-NEXT:    [[TMP71:%.*]] = extractelement <4 x i8> [[TMP61]], i32 0
+; CHECK-NEXT:    store i8 [[TMP71]], ptr [[TMP20]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE23]]
+; CHECK:       [[PRED_STORE_CONTINUE23]]:
+; CHECK-NEXT:    [[TMP72:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT:    br i1 [[TMP72]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
+; CHECK:       [[PRED_STORE_IF24]]:
+; CHECK-NEXT:    [[TMP73:%.*]] = extractelement <4 x i8> [[TMP61]], i32 1
+; CHECK-NEXT:    store i8 [[TMP73]], ptr [[TMP21]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE25]]
+; CHECK:       [[PRED_STORE_CONTINUE25]]:
+; CHECK-NEXT:    [[TMP74:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT:    br i1 [[TMP74]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27:.*]]
+; CHECK:       [[PRED_STORE_IF26]]:
+; CHECK-NEXT:    [[TMP75:%.*]] = extractelement <4 x i8> [[TMP61]], i32 2
+; CHECK-NEXT:    store i8 [[TMP75]], ptr [[TMP22]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE27]]
+; CHECK:       [[PRED_STORE_CONTINUE27]]:
+; CHECK-NEXT:    [[TMP76:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT:    br i1 [[TMP76]], label %[[PRED_STORE_IF28:.*]], label %[[PRED_STORE_CONTINUE29]]
+; CHECK:       [[PRED_STORE_IF28]]:
+; CHECK-NEXT:    [[TMP77:%.*]] = extractelement <4 x i8> [[TMP61]], i32 3
+; CHECK-NEXT:    store i8 [[TMP77]], ptr [[TMP23]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE29]]
+; CHECK:       [[PRED_STORE_CONTINUE29]]:
+; CHECK-NEXT:    [[PREDPHI:%.*]] = select <4 x i1> [[TMP11]], <4 x i8> [[TMP61]], <4 x i8> zeroinitializer
+; CHECK-NEXT:    [[TMP78:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT:    [[TMP79:%.*]] = zext <4 x i1> [[TMP78]] to <4 x i32>
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[TMP80:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP80]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP79]], i32 3
+; CHECK-NEXT:    br label %[[SCALAR_PH:.*]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT:    [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT:    [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT:    [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT:    br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[IV]]
+; CHECK-NEXT:    [[L_2:%.*]] = load i8, ptr [[GEP]], align 1
+; CHECK-NEXT:    [[UDIV:%.*]] = udiv i8 [[L_2]], 2
+; CHECK-NEXT:    store i8 [[UDIV]], ptr [[GEP]], align 1
+; CHECK-NEXT:    br label %[[LATCH]]
+; CHECK:       [[LATCH]]:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT:    [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 0
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+  %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+  %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+  %l = load i8, ptr %gep.src
+  %c = icmp eq i8 %l, 0
+  br i1 %c, label %latch, label %then
+
+then:
+  %or = or i8 %arg, 1
+  %gep = getelementptr inbounds i8, ptr %dst, i32 %iv
+  %l.2 = load i8, ptr %gep
+  %udiv = udiv i8 %l.2, 2
+  store i8 %udiv, ptr %gep
+  br label %latch
+
+latch:
+  %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+  %cmp = icmp ne i8 %phi, 0
+  %zext = zext i1 %cmp to i32
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv, 18
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 0
+}
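
The guard added to `simplifyRecipe` in the diff above can be pictured with a small standalone model. This is only a sketch under assumed, simplified types: `Region`, `Block`, `Recipe`, `isPowerOf2`, and `shouldFoldUDivToLShr` are hypothetical stand-ins and not the VPlan classes. Only the shape of the check (parent block, then enclosing region, then `isReplicator()`) follows the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for VPRegionBlock.
struct Region {
  bool Replicator;
  bool isReplicator() const { return Replicator; }
};

// Hypothetical stand-in for a basic block; a null Parent means the block is
// not nested in any region.
struct Block {
  const Region *Parent;
  const Region *getParent() const { return Parent; }
};

// Hypothetical stand-in for a udiv recipe with a constant divisor.
struct Recipe {
  const Block *Parent;
  uint64_t Divisor;
  const Block *getParent() const { return Parent; }
};

static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Mirrors the patched condition: the fold is skipped inside replicate
// regions and otherwise applies only to power-of-two divisors.
static bool shouldFoldUDivToLShr(const Recipe &Def) {
  const Region *ParentRegion = Def.getParent()->getParent();
  bool IsInReplicateRegion = ParentRegion && ParentRegion->isReplicator();
  return !IsInReplicateRegion && isPowerOf2(Def.Divisor);
}
```

In this model a recipe whose block has no enclosing region, or whose region is not a replicator, is still folded, so the change only narrows the fold for the replicate-region case described in the PR.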

llvmbot (Member) commented Jan 11, 2026

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

The UDiv fold added in d12e993 (#174581) is currently also applied to replicate regions, so we may end up with VPInstructions in replicate regions, which is currently not supported.

Fixes #175295.


+; CHECK:       [[PRED_LOAD_IF14]]:
+; CHECK-NEXT:    [[TMP57:%.*]] = load i8, ptr [[TMP23]], align 1
+; CHECK-NEXT:    [[TMP58:%.*]] = insertelement <4 x i8> [[TMP55]], i8 [[TMP57]], i32 3
+; CHECK-NEXT:    br label %[[PRED_LOAD_CONTINUE15]]
+; CHECK:       [[PRED_LOAD_CONTINUE15]]:
+; CHECK-NEXT:    [[TMP59:%.*]] = phi <4 x i8> [ [[TMP55]], %[[PRED_LOAD_CONTINUE13]] ], [ [[TMP58]], %[[PRED_LOAD_IF14]] ]
+; CHECK-NEXT:    [[TMP60:%.*]] = lshr <4 x i8> [[TMP43]], splat (i8 1)
+; CHECK-NEXT:    [[TMP61:%.*]] = lshr <4 x i8> [[TMP59]], splat (i8 1)
+; CHECK-NEXT:    [[TMP62:%.*]] = extractelement <4 x i1> [[TMP10]], i32 0
+; CHECK-NEXT:    br i1 [[TMP62]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
+; CHECK:       [[PRED_STORE_IF]]:
+; CHECK-NEXT:    [[TMP63:%.*]] = extractelement <4 x i8> [[TMP60]], i32 0
+; CHECK-NEXT:    store i8 [[TMP63]], ptr [[TMP12]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE]]
+; CHECK:       [[PRED_STORE_CONTINUE]]:
+; CHECK-NEXT:    [[TMP64:%.*]] = extractelement <4 x i1> [[TMP10]], i32 1
+; CHECK-NEXT:    br i1 [[TMP64]], label %[[PRED_STORE_IF16:.*]], label %[[PRED_STORE_CONTINUE17:.*]]
+; CHECK:       [[PRED_STORE_IF16]]:
+; CHECK-NEXT:    [[TMP65:%.*]] = extractelement <4 x i8> [[TMP60]], i32 1
+; CHECK-NEXT:    store i8 [[TMP65]], ptr [[TMP13]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE17]]
+; CHECK:       [[PRED_STORE_CONTINUE17]]:
+; CHECK-NEXT:    [[TMP66:%.*]] = extractelement <4 x i1> [[TMP10]], i32 2
+; CHECK-NEXT:    br i1 [[TMP66]], label %[[PRED_STORE_IF18:.*]], label %[[PRED_STORE_CONTINUE19:.*]]
+; CHECK:       [[PRED_STORE_IF18]]:
+; CHECK-NEXT:    [[TMP67:%.*]] = extractelement <4 x i8> [[TMP60]], i32 2
+; CHECK-NEXT:    store i8 [[TMP67]], ptr [[TMP14]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE19]]
+; CHECK:       [[PRED_STORE_CONTINUE19]]:
+; CHECK-NEXT:    [[TMP68:%.*]] = extractelement <4 x i1> [[TMP10]], i32 3
+; CHECK-NEXT:    br i1 [[TMP68]], label %[[PRED_STORE_IF20:.*]], label %[[PRED_STORE_CONTINUE21:.*]]
+; CHECK:       [[PRED_STORE_IF20]]:
+; CHECK-NEXT:    [[TMP69:%.*]] = extractelement <4 x i8> [[TMP60]], i32 3
+; CHECK-NEXT:    store i8 [[TMP69]], ptr [[TMP15]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE21]]
+; CHECK:       [[PRED_STORE_CONTINUE21]]:
+; CHECK-NEXT:    [[TMP70:%.*]] = extractelement <4 x i1> [[TMP11]], i32 0
+; CHECK-NEXT:    br i1 [[TMP70]], label %[[PRED_STORE_IF22:.*]], label %[[PRED_STORE_CONTINUE23:.*]]
+; CHECK:       [[PRED_STORE_IF22]]:
+; CHECK-NEXT:    [[TMP71:%.*]] = extractelement <4 x i8> [[TMP61]], i32 0
+; CHECK-NEXT:    store i8 [[TMP71]], ptr [[TMP20]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE23]]
+; CHECK:       [[PRED_STORE_CONTINUE23]]:
+; CHECK-NEXT:    [[TMP72:%.*]] = extractelement <4 x i1> [[TMP11]], i32 1
+; CHECK-NEXT:    br i1 [[TMP72]], label %[[PRED_STORE_IF24:.*]], label %[[PRED_STORE_CONTINUE25:.*]]
+; CHECK:       [[PRED_STORE_IF24]]:
+; CHECK-NEXT:    [[TMP73:%.*]] = extractelement <4 x i8> [[TMP61]], i32 1
+; CHECK-NEXT:    store i8 [[TMP73]], ptr [[TMP21]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE25]]
+; CHECK:       [[PRED_STORE_CONTINUE25]]:
+; CHECK-NEXT:    [[TMP74:%.*]] = extractelement <4 x i1> [[TMP11]], i32 2
+; CHECK-NEXT:    br i1 [[TMP74]], label %[[PRED_STORE_IF26:.*]], label %[[PRED_STORE_CONTINUE27:.*]]
+; CHECK:       [[PRED_STORE_IF26]]:
+; CHECK-NEXT:    [[TMP75:%.*]] = extractelement <4 x i8> [[TMP61]], i32 2
+; CHECK-NEXT:    store i8 [[TMP75]], ptr [[TMP22]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE27]]
+; CHECK:       [[PRED_STORE_CONTINUE27]]:
+; CHECK-NEXT:    [[TMP76:%.*]] = extractelement <4 x i1> [[TMP11]], i32 3
+; CHECK-NEXT:    br i1 [[TMP76]], label %[[PRED_STORE_IF28:.*]], label %[[PRED_STORE_CONTINUE29]]
+; CHECK:       [[PRED_STORE_IF28]]:
+; CHECK-NEXT:    [[TMP77:%.*]] = extractelement <4 x i8> [[TMP61]], i32 3
+; CHECK-NEXT:    store i8 [[TMP77]], ptr [[TMP23]], align 1
+; CHECK-NEXT:    br label %[[PRED_STORE_CONTINUE29]]
+; CHECK:       [[PRED_STORE_CONTINUE29]]:
+; CHECK-NEXT:    [[PREDPHI:%.*]] = select <4 x i1> [[TMP11]], <4 x i8> [[TMP61]], <4 x i8> zeroinitializer
+; CHECK-NEXT:    [[TMP78:%.*]] = icmp ne <4 x i8> [[PREDPHI]], zeroinitializer
+; CHECK-NEXT:    [[TMP79:%.*]] = zext <4 x i1> [[TMP78]] to <4 x i32>
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[TMP80:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
+; CHECK-NEXT:    br i1 [[TMP80]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i32> [[TMP79]], i32 3
+; CHECK-NEXT:    br label %[[SCALAR_PH:.*]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 16, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LATCH:.*]] ]
+; CHECK-NEXT:    [[RECUR:%.*]] = phi i32 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[ZEXT:%.*]], %[[LATCH]] ]
+; CHECK-NEXT:    [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
+; CHECK-NEXT:    [[L:%.*]] = load i8, ptr [[GEP_SRC]], align 1
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[L]], 0
+; CHECK-NEXT:    br i1 [[C]], label %[[LATCH]], label %[[THEN:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    [[OR:%.*]] = or i8 [[ARG]], 1
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds i8, ptr [[DST]], i32 [[IV]]
+; CHECK-NEXT:    [[L_2:%.*]] = load i8, ptr [[GEP]], align 1
+; CHECK-NEXT:    [[UDIV:%.*]] = udiv i8 [[L_2]], 2
+; CHECK-NEXT:    store i8 [[UDIV]], ptr [[GEP]], align 1
+; CHECK-NEXT:    br label %[[LATCH]]
+; CHECK:       [[LATCH]]:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i8 [ [[UDIV]], %[[THEN]] ], [ 0, %[[LOOP]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ne i8 [[PHI]], 0
+; CHECK-NEXT:    [[ZEXT]] = zext i1 [[CMP]] to i32
+; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV]], 18
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i32 0
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
+  %recur = phi i32 [ 0, %entry ], [ %zext, %latch ]
+  %gep.src = getelementptr inbounds i8, ptr %src, i32 %iv
+  %l = load i8, ptr %gep.src
+  %c = icmp eq i8 %l, 0
+  br i1 %c, label %latch, label %then
+
+then:
+  %or = or i8 %arg, 1
+  %gep = getelementptr inbounds i8, ptr %dst, i32 %iv
+  %l.2 = load i8, ptr %gep
+  %udiv = udiv i8 %l.2, 2
+  store i8 %udiv, ptr %gep
+  br label %latch
+
+latch:
+  %phi = phi i8 [ %udiv, %then ], [ 0, %loop ]
+  %cmp = icmp ne i8 %phi, 0
+  %zext = zext i1 %cmp to i32
+  %iv.next = add i32 %iv, 1
+  %ec = icmp eq i32 %iv, 18
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i32 0
+}
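
Reader's note on the test above: the vector body checks `lshr <4 x i8> ..., splat (i8 1)`, while the scalar remainder loop keeps `udiv i8 ..., 2`. The two forms agree because an unsigned division by a power of two equals a logical right shift by its base-2 logarithm. The following is a minimal standalone C++ sketch of that equivalence, not LLVM code; the helper names `exactLog2`, `udivAsLshr`, and `foldMatchesUDivForAllI8` are hypothetical and only loosely mirror what `APInt::exactLogBase2()` provides in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): log2 of a power-of-two divisor.
static unsigned exactLog2(uint8_t d) {
  assert(d != 0 && (d & (d - 1)) == 0 && "divisor must be a power of two");
  unsigned shift = 0;
  while ((d >> shift) != 1)
    ++shift;
  return shift;
}

// `udiv x, d` rewritten as `lshr x, log2(d)` on 8-bit unsigned values.
static uint8_t udivAsLshr(uint8_t x, uint8_t d) {
  return static_cast<uint8_t>(x >> exactLog2(d));
}

// Exhaustively compare both forms for every i8 value and every
// power-of-two divisor that fits in 8 bits.
static bool foldMatchesUDivForAllI8() {
  for (unsigned d = 1; d <= 128; d <<= 1)
    for (unsigned x = 0; x <= 255; ++x)
      if (udivAsLshr(static_cast<uint8_t>(x), static_cast<uint8_t>(d)) != x / d)
        return false;
  return true;
}
```

The sketch only covers value equivalence. The bug fixed by this PR is about where the rewritten recipe may live, not about the arithmetic.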


@artagnon artagnon left a comment


Hm, I'd have never guessed: why are VPInstructions not supported in VPReplicateRegions? I understand that this is probably a design flaw, but I'm interested in some design history?

LGTM, thanks!

@@ -0,0 +1,310 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5

Suggested change:
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --filter-out-after "^scalar.ph" --version 5

@fhahn (author) replied:

Done, thanks.


@fhahn fhahn left a comment


> Hm, I'd have never guessed: why are VPInstructions not supported in VPReplicateRegions? I understand that this is probably a design flaw, but I'm interested in some design history?

Replicate regions are a bit of a legacy construct that has been coupled with replicate (and scalar-iv-steps) recipes since the initial VPlan introduction. Lots of things have changed since then, but not yet the handling of replicate regions. Down the line, the plan is to explicitly dissolve replicate regions as well (#170212), eventually removing the implicit VPTransformState::Lane completely. After that, it should be possible to support VPInstructions in replicate regions as well.
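
Reader's note: the constraint described in this reply can be illustrated with a toy model. The sketch below is not LLVM code and does not reproduce the actual patch. `ToyRecipe`, `ToyState`, `canExecute`, and `simplify` are hypothetical stand-ins, and modelling region membership as a boolean flag is an assumption made for illustration. It only shows the shape of the problem: a replicate region runs with a specific lane selected, a VPInstruction-like recipe may not execute in that state (the `!State.Lane` assertion from the linked issue), so the fold has to be skipped there.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy stand-ins for illustration only; these are NOT LLVM's classes.
enum class ToyKind { Other, VPInstructionLike };

struct ToyRecipe {
  ToyKind Kind;
  std::string Opcode;
  bool InReplicateRegion; // assumption: region membership as a plain flag
};

struct ToyState {
  std::optional<unsigned> Lane; // set while a replicate region runs per lane
};

// Models the constraint behind the reported assertion: a VPInstruction-like
// recipe cannot be executed while a specific lane is selected.
bool canExecute(const ToyRecipe &R, const ToyState &S) {
  return R.Kind != ToyKind::VPInstructionLike || !S.Lane.has_value();
}

// Guarded fold: udiv by a power of two becomes a VPInstruction-like lshr,
// but only outside replicate regions.
ToyRecipe simplify(const ToyRecipe &R, unsigned Divisor) {
  bool IsPow2 = Divisor != 0 && (Divisor & (Divisor - 1)) == 0;
  if (R.Opcode == "udiv" && IsPow2 && !R.InReplicateRegion)
    return {ToyKind::VPInstructionLike, "lshr", R.InReplicateRegion};
  return R;
}

// Small self-check tying the two pieces together.
void demo() {
  ToyState PerLane{0u};
  ToyRecipe Predicated{ToyKind::Other, "udiv", true};
  ToyRecipe Plain{ToyKind::Other, "udiv", false};
  assert(canExecute(simplify(Predicated, 2), PerLane)); // fold skipped
  assert(simplify(Plain, 2).Opcode == "lshr");          // fold applied
}
```

Without the `!R.InReplicateRegion` guard in this toy, the predicated `udiv` would be rewritten to a VPInstruction-like recipe and `canExecute` would return false for a per-lane state, which is the failure mode the PR description refers to.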

@fhahn fhahn enabled auto-merge (squash) January 12, 2026 11:41
@fhahn fhahn merged commit 8f18252 into llvm:main Jan 12, 2026
9 of 10 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 12, 2026
The UDiv fold added in d12e993 (#174581) is currently also applied to
replicate regions, which means we may end up with VPInstructions in
replicate regions, which is currently not supported.

Fixes llvm/llvm-project#175295.

PR: llvm/llvm-project#175460
navaneethshan pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Jan 14, 2026
The UDiv fold added in d12e993 (#174581) is currently also applied to
replicate regions, which means we may end up with VPInstructions in
replicate regions, which is currently not supported.

Fixes llvm/llvm-project#175295.

PR: llvm/llvm-project#175460
(cherry picked from commit 8f18252)
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
The UDiv fold added in d12e993 (llvm#174581) is currently also applied to
replicate regions, which means we may end up with VPInstructions in
replicate regions, which is currently not supported.

Fixes llvm#175295.

PR: llvm#175460
Linked issue: [LoopVectorize] Assertion `!State.Lane && "VPInstruction executing an Lane"' failed.
